Being able to track dependencies between syntactic elements separated by other
constituents is crucial for language acquisition and processing (e.g., in subject-noun/verb 
agreement). Although long assumed to require language-specific machinery, research 
on statistical learning has suggested that domain-general mechanisms may support 
the acquisition of non-adjacent dependencies. In this study, we investigated whether 
individuals with specific language impairment (SLI) — who have problems with 
long-distance dependencies in language — also have problems with statistical learning of 
non-adjacent relations. The results confirmed this hypothesis, indicating that statistical 
learning may subserve the acquisition and processing of long-distance dependencies in 
natural language. 
INTRODUCTION

In order to correctly interpret a sentence, a language user must 
often keep track of syntactic dependencies that span across many 
unrelated words. In English, for example, linguistic material 
may intervene between auxiliaries and inflectional morphemes 
(e.g., is cooking) or between subject nouns and verbs in number 
agreement (the books on the shelf are dusty). More complex relationships
among surface forms are found in long-distance relationships
between antecedents and gaps, such as in wh-questions
(e.g., Who did you see __?), anaphoric reference (e.g., John went
to the store where he bought some milk) and embedded clauses
(e.g., The buildings_1 that the architect_2 built_2 were_1 tall; where
the subscripts indicate dependency relations). Such discontinuous
dependencies are considered to be a fundamental and unique
property of human language (Tallerman et al., 2009). Indeed, 
the presence of such non-adjacent relationships in language was 
a major stumbling block (cf. Chomsky, 1959) for early asso- 
ciationist approaches to syntax (e.g., Skinner, 1957). But does 
this mean that non-adjacent dependencies cannot be acquired by 
domain-general means? 

Although much statistical learning research has focused on
detecting dependencies between adjacent linguistic elements
(see Gomez and Gerken, 2000; Saffran, 2003, for reviews), relatively
little research has focused on the learning of non-adjacent syntac- 
tic relationships. A key exception is recent work indicating that 
statistical learning of non-adjacent dependencies improves as the 
variability of elements that occur between two dependent items 
increases (Gomez, 2002; Onnis et al., 2003, 2004). When the set of 
items participating in the dependency is small relative to the set of 
intervening elements, the non-adjacent dependencies stand out as 
invariant structure against the changing background of more var- 
ied material. In addition, statistical learning of non-adjacencies 
has been demonstrated both for non-linguistic sounds (e.g., 
Gebhart et al., 2009) and visual stimuli (e.g., Fiser and Aslin,
2001, 2002; Onnis et al., 2003; Conway and Christiansen, 2006;
Pacton and Perruchet, 2008), suggesting that such learning was 
supported by domain-general mechanisms. However, an impor- 
tant theoretical caveat remains: it is unclear whether the mech- 
anisms involved in such variability learning are also used for 
non-adjacencies in language. Indeed, the potential relevance of 
statistical learning for understanding syntactic aspects of language 
has been the subject of much debate (e.g., Musso et al., 2003; 
Friederici, 2004; but see Marcus et al., 2003; de Vries et al., 2008).
In this paper, we test whether the same mechanism underlying 
variability learning also subserves natural language learning. This 
hypothesis will be tested by investigating whether individuals with 
SLI, who have well-attested difficulties with long-distance depen- 
dencies (e.g., Clahsen et al., 1997; Wexler, 2000; van der Lely 
and Battell, 2003), also have problems using variability to learn 
non-adjacent dependencies. 

Children's sensitivity to non-adjacent dependencies in lan- 
guage emerges gradually, with those apparent in the surface 
structure of sentences acquired earlier than more abstract non- 
adjacencies. For example, 18-month-olds are sensitive to vio- 
lations of non-adjacent dependencies between is and -ing in 
comprehension (Santelmann and Jusczyk, 1998), and the use of 
the present progressive morpheme -ing also shows up early in 
production (though initially without the appropriate dependency 
relation to the auxiliary is; Brown, 1973). Children's ability to deal 
with more abstract non-adjacencies comes later. Even after they 
have otherwise mastered subject-noun/verb agreement around 2 to
2.5 years of age, they still produce incorrect wh-questions with
agreement violations (such as, *What color is these?; Radford,
1990). Moreover, children also have problems responding correctly
to wh-questions involving a direct object wh-word and
a non-copular verb (such as, What did mummy say? to which
a 21-month-old responded Mummy; Radford, 1990). From age
3 years onward, children start to produce sentences of
increasing length and syntactic complexity, such as coordinating
conjunctions and center-embedded sentences in which the main 
clause is interrupted by a relative clause. Production and compre- 
hension errors of embedded relative clauses are still frequent in 
children aged between 3 and 6 years (Gaer, 1969; Cook, 1973). 

The order of acquisition of non-adjacencies in natural language
suggests that dependencies governing subject-noun/verb
agreement and auxiliary/inflectional morpheme relations,
which primarily involve surface-level cues between functional
elements, are acquired earlier than non-adjacent dependencies
involving more abstract constituent relationships, such as those
found in wh-questions and embedded relative clauses. Thus, work
on the statistical learning of non-adjacencies between words¹ has
focused on non-adjacent dependencies discernable in surface- 
level information. Gomez (2002) and Onnis et al. (2003, 2004) 
exposed adults to artificial languages in which sentences took 
the form of aXd, bXe, and cXf (e.g., pel-wadim-rud). Drawing 
on the observation that certain elements in natural language 
belong to relatively small sets (function morphemes like a, was, 
-s, and -ing), whereas others belong to very large sets (nouns and 
verbs), and the fact that learners must often track key dependen- 
cies between functional elements, the experimenters manipulated 
the size of the set from which the intervening X-elements were 
drawn. The hypothesis was that increasing the variability of the 
middle element would cause learners to steer away from adja- 
cent dependencies (e.g., aX and Xd in the string aXd) and 
instead focus on the non-adjacent a-d relationship. Variability 
was manipulated by drawing X from a set containing 2, 6, 12, or 
24 elements. Participants were then tested on their abilities to dis- 
tinguish sentences from the language (e.g., aXd) from foils (e.g., 
aXe). Counterintuitively, participants acquired the non-adjacent 
dependencies only when the variability of the middle items was at 
its highest (in set-size 24). Note that associations between adja- 
cent elements cannot explain these results because first-order 
conditional probabilities, e.g., P(X|a), decrease as the set size 
of X increases. Hence, participants only learned non-adjacent 
dependencies when adjacent dependencies were least predictable. 
Additional experiments demonstrated that infants as young as 15 
and 18 months of age (Gomez, 2002; Gomez and Maye, 2005) are 
able to use variability learning to discover non-adjacent depen- 
dencies, suggesting that this type of learning is present from at 
least the middle of the second year of life. 
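
The arithmetic behind the set-size argument can be illustrated with a minimal sketch. It assumes, as a simplification that the studies cited above do not state in this form, that every middle item is equally likely to follow a given first element; the function name `adjacent_tp` is hypothetical and serves illustration only.

```python
from fractions import Fraction

# The non-adjacent relation a_d is deterministic in the artificial language:
# whenever a string begins with a, it ends with d, whatever X is.
NONADJACENT_TP = Fraction(1)


def adjacent_tp(set_size: int) -> Fraction:
    """First-order conditional probability P(X_i | a) of one particular
    middle item X_i given the first element a.

    Assumption (ours): all X items are equiprobable after a.
    """
    if set_size < 1:
        raise ValueError("set_size must be a positive integer")
    return Fraction(1, set_size)


if __name__ == "__main__":
    # Set sizes used in the variability manipulation described above.
    for n in (2, 6, 12, 24):
        print(f"|X| = {n:2d}  P(X_i|a) = {adjacent_tp(n)}  P(d|a_) = {NONADJACENT_TP}")
```

Under this assumption the adjacent probability shrinks as the set of middle items grows, whereas the non-adjacent probability stays at 1, which is the contrast the variability hypothesis relies on.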

The positive effect of high variability has been replicated
in several subsequent studies. Misyak and Christiansen (2012)
obtained significant learning using a set-size of 24 and found
that individual differences in such learning correlated with offline
language comprehension of sentences involving embedded relative
clauses. By incorporating the set-size 24 stimuli within
a serial-reaction time (SRT) task, Misyak et al. (2010a,b) also
replicated the effect of high variability. They further found that
performance on this non-adjacency learning task predicted online
processing of embedded relative clauses in natural language. More
generally, though, it seems that some facilitatory factor is necessary
for non-adjacent dependency relations to be learnable, such
as high variability (as investigated here), phonological or visual
cues (e.g., de Vries et al., 2012; van den Bos et al., 2012), scaffolded
learning (Lai and Poletiek, 2011), or prolonged exposure
(Udden et al., 2012). Some combination of these facilitatory factors
is likely to be available in language development, suggesting
a possible role for statistical learning in guiding the first steps
of acquisition of not only simple but also more complex,
non-adjacent syntactic structures.

¹ The learning of non-adjacency relationships between syllables within words
has yielded mixed results (e.g., Pena et al., 2002; Onnis et al., 2005), though
evidence of sensitivity to non-adjacent dependencies between phonological
segments has been found (Newport and Aslin, 2004). Moreover, Onnis et al.
(2003) have subsequently demonstrated that it is possible to learn dependencies
between non-adjacent syllables within words when the syllabic material
intervening between the dependent syllables is highly variable.

Children with SLI provide an ideal population to test the 
hypothesis that statistical learning and language are supported 
by the same underlying mechanisms. These children present a 
slow development of spoken language that in most cases results 
in long-term restrictions in listening and speaking skills in the 
absence of hearing loss, or other neurodevelopmental disor- 
ders, including autism and mental retardation (Tomblin et al., 
1996). Extensive research has shown that children with SLI have 
considerable difficulties with the grammatical morphology of 
English (e.g., Johnston and Schery, 1976; Gopnik and Crago, 
1991; McGregor and Leonard, 1994; Hadley and Rice, 1996; 
Cleave and Rice, 1997; Bedore and Leonard, 1998) and other lan- 
guages (e.g., Clahsen, 1989; Leonard, 2000) — in particular, with 
grammatical relationships extending across non-adjacent lexi- 
cal elements within and between clauses. These difficulties with 
long-distance syntactic dependencies have been addressed within 
generative grammar perspectives by Wexler's (2000) Unique 
Checking Constraint account of SLI, van der Lely and Battell's 
(2003) representational deficits for long-distance relationships 
theory, and Clahsen et al.'s (1997) agreement-deficit hypothe- 
sis. These accounts have explained the difficulties children with 
SLI have with long-distance dependencies in terms of domain- 
specific grammatical impairments. In contrast, we hypothesize 
that impairments to statistical learning mechanisms support- 
ing variability learning underlie these observed problems with 
non-adjacent dependencies in language. 

Preliminary support for this hypothesis comes from studies 
investigating statistical learning of adjacent dependencies. Evans 
et al. (2009) reported that children with SLI were unable to use 
transitional probabilities between adjacent syllables to identify 
word boundaries. Additional support comes from two studies 
involving a heterogeneous population of college-aged adults with 
a history of language impairment, dyslexia, and/or learning dis- 
abilities (LI/D/LD), for which they have received therapy and/or 
other service. Individuals with LI/D/LD were found to have prob- 
lems not only in using adjacency information to learn word 
patterns generated by a finite state grammar (Plante et al., 2002) 
but also with variability learning of non-adjacent dependencies 
(Grunow et al., 2006).

Given that statistical learning involves implicit learning of 
probabilistic patterns, research on procedural learning in SLI 
also casts light on our hypothesis. Several SRT studies observed 
poorer learning of sequences of visual patterns in children with 
SLI (Tomblin et al., 2007; Lum et al., 2010, 2011; Hedenius et al.,
2011). Moreover, Kemeny and Lukacs (2009) reported depressed
performance by children with SLI on a Weather Prediction Task 
that involves learning probabilistic classification. Thus, whereas 
previous studies point to a possible link between statistical learn- 
ing and language ability, we provide a direct test of the account 
by determining whether individuals with SLI — who have well- 
attested problems with syntactic non-adjacencies — also have 
problems using variability learning to discover such dependencies 
via statistical learning. To this end, we adopted the non-adjacent 
dependency learning task developed by Gomez (2002) and com- 
pared performance of a group of adolescents with SLI to adoles- 
cents with normal language (NL) ability. We predicted that high 
variability of the intervening elements would facilitate the NL 
learners' learning of non-adjacent dependencies, but would not 
aid the SLI learners. 

A second goal of the current study was to gain further 
understanding of the learning processes involved in learning 
non-adjacent dependencies, particularly in individuals with SLI. 
Previous studies have indicated that high variability may not facil- 
itate learning of non-adjacent dependencies in individuals with 
language impairment. This suggests that learners with language 
learning difficulty might exploit a different learning strategy that 
is sensitive to the number of target pairs to learn. It is possible that 
the participants with language learning difficulty learned the sen- 
tence strings exemplar by exemplar without paying attention to 
the structural regularities embedded in the stimuli. In the current 
study, we investigated this hypothesis by examining the accuracy 
of each target non-adjacent pair separately.

METHODS 
PARTICIPANTS 

One hundred twenty adolescents aged 13-15 years were recruited 
from a large sample of children who have been participating 
in a longitudinal investigation of SLI (see Tomblin et al., 1997, 
for details of sampling and assessment). Sixty of these adoles- 
cents had NL skills and 60 were age- and non-verbal IQ-matched 
adolescents with specific language impairment (SLI)². The participants
from each language group (NL, SLI) were randomly
assigned to one of three variability conditions: low (X = 2), mid
(X = 12), and high (X = 24) variability. Two-way ANOVAs, with
group and variability condition as the between-subject factors,
were conducted to inspect differences in non-verbal IQ
and language abilities between groups across conditions.
Group summary statistics are provided in Table 1. Each
of the SLI groups had comparable non-verbal cognition to the 
paired NL groups in terms of Performance IQ on WISC-III 
(Wechsler, 1991; F(1, 114) = 0.12, p = 0.74), but showed significantly
poorer language abilities than the paired NL groups in
terms of language composite standard scores [F(1, 114) = 137.6, p
< 0.0001] compiled from CELF-III (Semel et al., 1995), PPVT-R
(Dunn and Dunn, 1981), CREVT (Wallace and Hammill,
1997), and the listening comprehension adaptation of the QRI-II
(Leslie and Caldwell, 1995). Differences in non-verbal cognition
and language composite scores between the three SLI subgroups
or between the three NL subgroups were not significant. Scores
from the Competing Language Processing Task-Word Repetition
subtest (CLPT-Word Repetition, Gaulin and Campbell, 1994) did
not serve as a selection criterion but were used to test potential
effects of working memory on non-adjacency learning. Informed
consent was obtained from each of the participants before they
took part in the current study. This research was approved by the
Institutional Review Board of the University of Iowa.

² Participants in both groups were not required to have performance IQ levels
above 85 in this study. Although this restriction has been common in SLI studies,
it has recently come under scrutiny (Tager-Flusberg and Cooper, 1999).
Thus, as shown in Table 1, 25% of the children with SLI had performance IQs
below 85. Crucially, the two groups differed only on language skills, whereas
the performance IQ was the same across both groups.

MATERIALS 

Following Gomez (2002), the stimuli consisted of three depen- 
dency pairs: aXd, bXe, and cXf. To investigate the role of variability 
in non-adjacency learning, we varied the size of the set from 
which the middle element (X) was drawn: low (X = 2), mid 
(X = 12), and high (X = 24) variability. The beginning (a, b, 
c) and ending (d, e, f) stimulus tokens were instantiated by the 
non-words pel, dak, vot, rud, jic, and tood. The non-words used 
to instantiate the 24 intervening X-tokens in the high-variability 
conditions were wadim, kicey, puser, fengle, coomo, loga, gople, 
taspu, hiftam, deecha, vamey, skiger, benez, gensim, feenam, lael- 
jeen, chila, roosa, plizet, balip, malsig, suleb, nilbo and wiffle. The 
X-tokens for the low- and mid-variability conditions consisted 
of the first 2 and 12 non-words, respectively, from this set. Each 
non-word was recorded separately by a female native speaker of 
English to ensure that lexical stress was similar for all mono- 
syllables and all disyllables. The assignment of particular tokens 
(e.g., pel) to particular stimulus variables (e.g., the b in bXe) for 
each participant was randomized to avoid learning biases due to 
specific sound properties of the non-words (Onnis et al., 2005). 
There was a 250-ms pause between each word in a string, and a 
750-ms pause between strings. 

Frequency of exposure to the dependency pairs (i.e., aXd, bXe, 
and cXf) was held constant across the three variability conditions, 
allowing for comparisons of learning in the three variability con- 
ditions. The training stimuli consisted of 144 presentations of 
each dependency pair, randomly interleaved, for a total of 432 
training strings. The test material included 6 instances of the orig- 
inal training strings (two each of aXd, bXe, and cXf) and 6 foils 
produced by disrupting the non-adjacency relationship (two each 
of *aXe, *bXf, and *cXd).
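
The design constraints described above can be sketched in a few lines of Python. This is a sketch under stated assumptions rather than the script used in the study: the pairing of tokens to frames shown here is one example assignment (the actual assignment was randomized per participant), each middle item is assumed to occur equally often within each frame, and the names `FRAMES`, `FOIL_FRAMES`, and `build_training_set` are hypothetical.

```python
import random
from collections import Counter

# One example assignment of tokens to the frames a_d, b_e, c_f (assumption:
# in the study this assignment was randomized for each participant).
FRAMES = [("pel", "rud"), ("dak", "jic"), ("vot", "tood")]

# Foil frames disrupt the non-adjacent relation: *a_e, *b_f, *c_d.
FOIL_FRAMES = [("pel", "jic"), ("dak", "tood"), ("vot", "rud")]

# The 24 middle (X) tokens, in the order listed in the Materials section.
X_TOKENS = [
    "wadim", "kicey", "puser", "fengle", "coomo", "loga", "gople", "taspu",
    "hiftam", "deecha", "vamey", "skiger", "benez", "gensim", "feenam",
    "laeljeen", "chila", "roosa", "plizet", "balip", "malsig", "suleb",
    "nilbo", "wiffle",
]

PRESENTATIONS_PER_PAIR = 144  # held constant across variability conditions


def build_training_set(set_size, seed=0):
    """Return 432 shuffled (first, X, last) training strings for one condition.

    Assumption (ours): every X item appears equally often in every frame.
    """
    reps, remainder = divmod(PRESENTATIONS_PER_PAIR, set_size)
    if remainder:
        raise ValueError("set_size must divide 144 evenly")
    middles = X_TOKENS[:set_size]  # low and mid conditions use the first 2 / 12
    strings = [
        (first, x, last)
        for first, last in FRAMES
        for x in middles
        for _ in range(reps)
    ]
    random.Random(seed).shuffle(strings)  # randomly interleave the three pairs
    return strings


if __name__ == "__main__":
    for size in (2, 12, 24):
        counts = Counter(build_training_set(size))
        print(size, len(counts), "unique strings,",
              set(counts.values()), "presentations each")
```

Holding the number of presentations per dependency pair fixed means that token frequency per string necessarily falls as the set of middle items grows, a point that becomes relevant when item-specific learning is considered.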

PROCEDURE 

Twenty participants from each language group (NL, SLI) were 
randomly assigned to one of the three variability conditions. 
They were instructed to listen to sequences of non-sense syllables,
on which they would later be tested. The participants
were not informed about any rules or patterns embedded
in the materials³. After training, participants were informed that
the syllable sequences they had heard were generated according to
rules specifying word order and were asked to provide grammaticality
judgments for the test items by pressing a Y (Yes for grammatical
strings) or an N (No for ungrammatical strings) key on the
keyboard.

³ We used the same instructions as in Gomez (2002): "Your task is to listen
to sequences of non-sense syllables. We will test you later so pay close attention.
This phase of the study takes about 20 min, divided into 3 parts. You can
take a break after each part. Please let the experimenter know if you have any
questions."

Table 1 | Group summary statistics for the adolescents with specific language impairments (SLI) and with normal language (NL) in the low
(X = 2), mid (X = 12), and high (X = 24) variability conditions.

Condition  Group  Age          CLPTᵃ        WISC-IIIᵇ     Language composite score

X = 2      NL     13;9 (0.6)   72.8 (13.6)  95.1 (11.0)   97.6 (10.1)
           SLI    [illegible]  [illegible]  [illegible]   [illegible]

X = 12     NL     14;2 (0.5)   73.4 (12.3)  94.3 (13.3)   96.8 (12.6)
           SLI    14;1 (0.6)   62.2 (11.3)  93.40 (12.6)  77.01 (7.5)

X = 24     NL     14;3 (0.6)   74.8 (8.4)   95.9 (12.2)   96.2 (11.0)
           SLI    13;8 (0.4)   58.6 (13.8)  95.5 (10.4)   76.9 (6.2)

Total      NL     14;1 (0.6)   73.7 (11.4)  95.1 (12.0)   96.9 (11.1)
           SLI    14;0 (0.6)   59.1 (12.2)  94.3 (11.8)   76.9 (6.8)

CLPT, WISC-III, language composite scores: standard scores with a mean of 100, standard deviation of 15.

ᵃ Competing Language Processing Task-Word Repetition subtest.

ᵇ Wechsler Intelligence Scale for Children (3rd edition)-Performance IQ.

Standard deviations are reported in parentheses.



RESULTS 

OVERALL PERFORMANCE 

The overall mean accuracy scores of the SLI and the NL groups in
each of the three variability conditions are shown in Figure 1. There
were a total of 12 test items, of which 6 were grammatical
strings and 6 were ungrammatical strings.

A list of each participant's performance in terms of hit rate (i.e.,
the proportion of endorsements for grammatical items) and false-alarm
rate (i.e., the proportion of endorsements for ungrammatical
items) is provided in Table A1⁴. Table 2 presents the group
mean hit and false-alarm rates for the SLI and the
NL groups. First, we inspected response bias (β) across groups
and variability conditions. We found that group differences in β
were not significant in any of the three variability conditions
[t(38) = -0.70, p = 0.46 in X = 2; t(38) = 0.80, p = 0.43
in X = 12; t(38) = 1.43, p = 0.16 in X = 24]. Given that the
two groups did not show different response biases, the participants'
performance was evaluated statistically using a mixed
design ANOVA with language group (NL vs. SLI) and variability
condition (X = 2, X = 12, X = 24) as between-subjects
variables and grammaticality (grammatical vs. ungrammatical
strings) as a within-subjects variable. There was a significant main
effect of grammaticality, F(1, 114) = 11.43, p = 0.001, partial
η² = 0.09, and a significant Grammaticality × Language Group interaction,
F(1, 114) = 6.34, p = 0.01, partial η² = 0.05. There were no
other main effects or interactions. Post-hoc comparisons indicated
that overall the NL learners accepted grammatical strings more
frequently than they accepted ungrammatical items [t(59) = 3.97,
p < 0.001, d = 0.80]. However, this pattern of performance was
not observed in learners with SLI.

⁴ Using the criteria of a hit rate at or below 0.33 and a false-alarm rate at or
above 0.66, we found that in the high variability condition there were five
participants with SLI but only one participant with NL who showed a low
rate of hits but a high rate of false alarms. Crucially, such a group difference was
not found in the other two variability conditions (only 1 participant from
each group in the X = 2 condition, and 2 from the NL and 3 from the SLI
groups in the X = 12 condition). Therefore, the observed group difference
in the X = 24 condition is unlikely to derive from overall confusion about
performing the task, but instead might reflect general difficulty in learning
non-adjacent pairs in the learners with SLI.

FIGURE 1 | Mean accuracy for the NL and the SLI groups in the low,
mid, and high variability conditions. Error bars represent s.e.m.

We predicted that high variability of the intervening elements 
would facilitate the NL learners' learning of non-adjacent depen- 
dencies, but would not aid the SLI learners. To test this prediction, 
we conducted a series of planned comparisons to examine the 
rates of acceptance of grammatical strings against ungrammat- 
ical strings for the two groups in each of the three variability 
conditions. There was a significant grammaticality effect with
a large effect size for the NL learners exposed to high variability
[t(19) = 3.01, p = 0.007, d = 1.06]. In addition, a significant
grammaticality effect with a moderate effect size was observed
for the NL learners exposed to low variability [t(19) = 2.14, p =
0.046, d = 0.65]. The smaller effect size under low variability suggests that high
variability best facilitates learning of non-adjacent dependencies.
In contrast, performance by the learners with SLI did not reach
significance in any variability condition. Together, the results suggest
that high variability facilitates non-adjacent dependency
learning for NL learners, but not for learners with SLI.

Table 2 | Participants' responses in terms of hit and false-alarm rates for the NL and the SLI groups.

           NL                                           SLI
Set size   Hit          False positive  Difference     Hit          False positive  Difference

2          0.75 (0.05)  0.59 (0.05)     0.16           0.68 (0.04)  0.58 (0.03)     0.10
12         0.70 (0.6)   0.57 (0.07)     0.13           0.61 (0.06)  0.61 (0.06)     0.00
24         0.79 (0.06)  0.49 (0.06)     0.30           0.60 (0.08)  0.61 (0.07)     -0.01

The numbers in parentheses represent standard errors of the mean.

FIGURE 2 | Scatter plots of language composite z scores and the
difference scores between hit and false alarm for the (A) low, (B) mid,
and (C) high variability conditions.

Might there be a correspondence between individual dif- 
ferences in learning non-adjacent dependencies and individual 
variations in language skills across the two groups? Specifically, 
if high variability is critical for detecting and learning depen- 
dent relationships between remote items, we might expect to 
see an association between the participants' language skills and 
their performance in the high variability condition. Simple correlations
(Pearson's r) were calculated between the participants'
language composite scores and the difference scores between 
correct acceptance and false positives in the non-adjacent depen- 
dency learning task. As shown in Figure 2, a significant, albeit 
modest, correlation was found for high variability (r = 0.44, 
p = 0.004), indicating a positive relationship between the abil- 
ity to learn non-adjacent dependencies under high variability and 
language attainment. Non-significant correlations were obtained 
for the other two variability conditions. Moreover, the difference 
scores in the high variability condition were not significantly cor- 
related with individual differences in working memory measured 
with CLPT. 

ITEM-SPECIFIC LEARNING IN SLI 

Although the SLI participants as a group did not show evidence of 
learning in any variability condition, it remains possible, though, 
that some non-adjacent pairs were learned by SLI learners, but 
that the aggregate across items was not great enough to show a 
significant learning effect. We therefore calculated the number of 
non-adjacent word pairs (max = 3) that each participant learned. 
A given non-adjacency pair is considered "learned" if a learner 
was able to correctly accept all grammatical and reject all ungram- 
matical strings involving this pair (i.e., hit rate = 100% and false 
positive = 0%). This scoring method allows us to examine item-specific
learning that might be obscured by the aggregate score.
Furthermore, such item-specific learning may benefit more from
low variability, where fewer items need to be learned.
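
The two scoring schemes, aggregate hit and false-alarm rates versus the number of pairs mastered, can be contrasted in a short sketch. It is illustrative only: the trial representation and the function names `hit_and_false_alarm_rates` and `pairs_learned` are hypothetical, the demo responses are invented, and a foil is assumed to "involve" the pair that shares its first element (e.g., *aXe is scored with the a-d pair), which the text above does not specify.

```python
from collections import defaultdict

# A trial is (first_element, is_grammatical, endorsed).
# Assumption (ours): foils are grouped with the pair sharing their first element.


def hit_and_false_alarm_rates(trials):
    """Proportion of endorsed grammatical items and endorsed ungrammatical items."""
    hits = [endorsed for _, grammatical, endorsed in trials if grammatical]
    false_alarms = [endorsed for _, grammatical, endorsed in trials if not grammatical]
    return sum(hits) / len(hits), sum(false_alarms) / len(false_alarms)


def pairs_learned(trials):
    """Count pairs with every grammatical string accepted and every foil rejected."""
    correct_by_pair = defaultdict(list)
    for first, grammatical, endorsed in trials:
        correct_by_pair[first].append(endorsed == grammatical)
    return sum(all(correct) for correct in correct_by_pair.values())


if __name__ == "__main__":
    # Invented responses for one hypothetical participant (12 test items).
    demo = [
        ("a", True, True), ("a", True, True), ("a", False, False), ("a", False, False),
        ("b", True, True), ("b", True, True), ("b", False, True), ("b", False, False),
        ("c", True, True), ("c", True, True), ("c", False, True), ("c", False, True),
    ]
    hit, fa = hit_and_false_alarm_rates(demo)
    print(f"hit = {hit:.2f}, false alarm = {fa:.2f}, pairs learned = {pairs_learned(demo)}")
```

The all-or-none criterion is deliberately strict: a participant who endorses every item reaches a perfect hit rate yet masters no pair, so the mastery score cannot be inflated by a bias toward responding "yes".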

Figure 3 shows the percentage of participants who learned
at least one non-adjacent pair in each group and variability
condition. Interestingly, the proportion of SLI participants
who learned at least one non-adjacent pair under low
variability was slightly higher than that under high variability,
while the opposite was true for the NL group. The finding
that more participants with SLI benefitted from low
than high variability in learning non-adjacency pairs suggests
item-specific learning: in the low variability condition there were
6 different strings (pel-wadim-rud, pel-kicey-rud, dak-wadim-jic,
dak-kicey-jic, vot-wadim-tood, vot-kicey-tood), each of which
occurred 72 times (i.e., high token frequency), whereas in
the high variability condition there were a total of 72 different
strings, with each string occurring only 6 times (i.e., low
token frequency). We further explored this suggestion by examining
correlations between language and performance in learning
non-adjacent dependencies. A list of each participant's language
composite score and number of non-adjacent items mastered is
provided in Table A2. Because there were participants who
did not reach 100% accuracy on any of the three non-adjacent
pairs, Spearman's rank correlation coefficient was used to minimize
the effect of extreme scores. Strikingly, as illustrated in
Figure 4, there was a significant correlation for the SLI group
under low variability (ρ = 0.72, p < 0.0001). That is, within the
SLI group, those who had better language ability learned more
pairs under low variability than those who had poorer language
ability. No significant correlations were found for mid and high variability.
For the NL group, the correlation coefficients in all three conditions
were negative, only just reaching significance in the mid
variability condition (ρ = -0.47, p = 0.04). Thus, for NL participants,
better language ability was not associated with mastering
non-adjacency pairs; indeed, there was a trend in the opposite
direction.

FIGURE 3 | Percentage of participants in the NL and the SLI groups who learned at
least one non-adjacent pair in the low, mid, and high variability
conditions.

FIGURE 4 | Correlations between number of pairs learned and
language ability for the participants with SLI in the low, mid, and high
variability conditions.

DISCUSSION 

The current study investigated variability learning of non- 
adjacent dependencies in adolescents with and without SLI. For 
the adolescents with NL ability, both those exposed to the high 
and low variability conditions showed an effect of learning, but 
the relative effect sizes suggest that high variability best facilitates 
learning of non-adjacent dependencies. It is possible that for the 
NL learners, repeated exposure to a few unique exemplars in the 
low variability condition could also assist learning. Importantly, 
though, the correlation analyses showed that only performance
under high variability was associated with an individual's
language skills. For the learners with SLI, on the other hand,
performance in the high variability condition did not reach significance.
Thus, although both infants and typically-developing
adults are able to use variability learning to detect non-adjacent
dependencies in speech input (Gomez, 2002; Onnis et al., 2003;
Grunow et al., 2006), the SLI group was unable to do so.

The same-mechanism hypothesis predicts an association 
between the participants' language skills and performance in 
the non-adjacent task. In the current study, we found a signifi- 
cant, albeit modest, correlation between the two variables in the 
high variability condition. That the association was only mod- 
erate might reflect the fact that the participants' language skills 
were evaluated using composite scores that pooled across several 
standardized language tests, rather than using tests specifically 
designed for evaluating syntactic performance on non-adjacent 
structures in English. Future studies should use tests that more 
directly examine individuals' proficiency in non-adjacent struc- 
tures in their native language (e.g., as in Misyak et al., 2010a,b). 

Why did the SLI participants fail to show learning under
conditions in which their NL peers did learn? Analyzing the
dependency-pair mastery scores, we found different group pro- 
files across the three variability conditions. For the NL group, 
high variability of the intervening elements led to the best mastery 
scores. In contrast, more non-adjacent pairs were learned by SLI 
adolescents under low variability than high variability, suggest- 
ing that perhaps different types of learning, involving different 
kinds of statistics, were adopted by the two groups in learning 
non-adjacent word pairs. 
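
As an illustration of how a dependency-pair mastery score of this kind could be computed, the following minimal sketch counts the pairs for which every test item was answered correctly, following the definition given in the note to Table A2 (number of items, max = 3, responded to with 100% accuracy). The function name, the trial format, and the example responses are hypothetical and are not taken from the study's materials or data.

```python
from collections import defaultdict


def mastery_score(trials):
    """Count dependency pairs whose test items were all answered correctly.

    `trials` is an iterable of (pair_id, correct) tuples, one per test item.
    This trial format is an assumption made for illustration only.
    """
    all_correct = defaultdict(lambda: True)
    for pair_id, correct in trials:
        # A pair stays "mastered" only while every response to it is correct.
        all_correct[pair_id] = all_correct[pair_id] and bool(correct)
    return sum(all_correct.values())


# Hypothetical responses for one participant (not data from the study).
example_trials = [
    ("pair1", True), ("pair1", True),
    ("pair2", True), ("pair2", False),
    ("pair3", True), ("pair3", True),
]
score = mastery_score(example_trials)
print(score)  # 2
```

Under this scoring, a single error on any item involving a pair removes that pair from the count, so the score is a conservative index of learning.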

One possible interpretation of the observed difference in 
learning pattern is that the adolescents with SLI might have 
attempted to learn the materials by rote memorization. Given 
that the low variability condition only involves 6 individual 
strings, each presented 72 times, whereas the high variability con- 
dition incorporated 72 separate strings, each presented only 6 
times, such an approach would seem reasonable. However, given 
that typically-developing adults are able to generalize to novel
strings (even when exposed to a zero variability condition with
only 3 unique strings), statistical learning of non-adjacencies is
unlikely to involve memorization under normal circumstances 
(Onnis et al., 2004). In contrast, the SLI group may have sought to 
memorize the strings, consistent with evidence that children with 
SLI rely substantially on memorized surface properties in spon- 
taneous speech (e.g., Jones and Conti-Ramsden, 1997; Riches 
et al., 2006). Thus, the correlation we found between the number
of adjacency pairs learned and language ability may suggest that 
memorization of input chunks as unanalyzed wholes may provide 
some advantages as a compensatory strategy for language learn- 
ing and processing, even though it may impede statistical learning 
of more complex aspects of language, including non-adjacent 
dependencies. 
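
The exposure arithmetic behind the two conditions can be made explicit with a short sketch. It assumes three non-adjacent frames (consistent with the maximum of 3 pairs in Table A2) combined with either 2 or 24 intervening elements, which yields the 6 and 72 unique strings mentioned above. The frame and element labels are placeholders, not the actual stimuli.

```python
from itertools import product

# Placeholder labels; the actual nonsense words are not listed in this section.
FRAMES = [("A1", "B1"), ("A2", "B2"), ("A3", "B3")]


def build_strings(n_middle):
    """Cross every frame with every intervening element (assumed design)."""
    middles = [f"X{i + 1}" for i in range(n_middle)]
    return [(first, mid, last) for (first, last), mid in product(FRAMES, middles)]


def exposure_schedule(n_middle, repetitions):
    """Return (number of unique strings, total presentations)."""
    strings = build_strings(n_middle)
    return len(strings), len(strings) * repetitions


low = exposure_schedule(n_middle=2, repetitions=72)
high = exposure_schedule(n_middle=24, repetitions=6)
print(low, high)  # (6, 432) (72, 432)
```

On these assumptions the total number of presentations is the same in both conditions, so the conditions differ in how often each individual string recurs rather than in overall exposure. This is why rote memorization is far more feasible under low variability.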

Tracking remote dependencies is crucial for language acquisition.
In this study, we have shown that the well-documented
problems that individuals with SLI have with long-distance syn- 
tactic dependencies may be associated with their inability to take 
advantage of variability in statistical learning. Given that statis- 
tical learning of non-adjacencies has been demonstrated both 
for non-linguistic sounds (e.g., Gebhart et al., 2009) and visual 
stimuli (Onnis et al., 2003; Pacton and Perruchet, 2008), the SLI
participants' problems with the non-adjacency learning task may 
reflect an impairment of domain-general mechanisms hypothe- 
sized to play an important role in the acquisition and processing 
of discontinuous dependencies in natural language. In typically- 
developing individuals, these mechanisms allow learners to use 
additional cues to acquire both probabilistic non-adjacencies
(van den Bos et al., 2012) and multiple overlapping non-adjacent
dependencies (de Vries et al., 2012). More generally, this
study contributes to our emerging understanding of the interre- 
lationship between statistical learning and language in typically- 
developing populations (e.g., Misyak et al., 2010a,b; Misyak and 
Christiansen, 2012), while underscoring the need for additional 
research on the possible role of statistical learning deficits in SLI. 
APPENDIX

Table A1 | Hit and false-alarm rate for each participant in the three conditions. 
One data point was missing for this participant due to program failure; only responses to 11 test items were recorded for this participant.

Note: Correct rejection rate (i.e., ungrammatical strings judged as ungrammatical) = 1 − false-alarm rate; miss rate (i.e., grammatical strings judged as
ungrammatical) = 1 − hit rate.
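
The relations among the four response rates can be illustrated with a minimal sketch. The function name, the response format, and the example judgments are hypothetical and are not data from the study.

```python
def response_rates(responses):
    """Compute hit, miss, false-alarm, and correct rejection rates.

    `responses` is a list of (is_grammatical, judged_grammatical) booleans,
    one per test item; this format is an assumption for illustration.
    """
    gram = [judged for is_gram, judged in responses if is_gram]
    ungram = [judged for is_gram, judged in responses if not is_gram]
    if not gram or not ungram:
        raise ValueError("need at least one grammatical and one ungrammatical item")
    hit_rate = sum(gram) / len(gram)
    false_alarm_rate = sum(ungram) / len(ungram)
    return {
        "hit": hit_rate,
        "miss": 1 - hit_rate,
        "false_alarm": false_alarm_rate,
        "correct_rejection": 1 - false_alarm_rate,
    }


# Hypothetical judgments: 4 grammatical items (3 accepted),
# 4 ungrammatical items (1 accepted).
example = [
    (True, True), (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False), (False, False),
]
rates = response_rates(example)
print(rates)
```

Because the miss and correct rejection rates are complements of the hit and false-alarm rates, reporting only the latter two loses no information.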
Table A2 | Language composite scores and number of non-adjacent pairs learned (100% accuracy) for each participant in the three conditions.
a 8th grade language composite z scores.

b Number of items (max = 3) responded to with 100% accuracy. 